-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SDK2: Ensure service endpoints track2 #3885
base: master
Are you sure you want to change the base?
Conversation
/azp run ci,e2e |
Azure Pipelines successfully started running 2 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for a tiny nit.
One question though: how does this address the cluster install failure? |
We have seen some install failures about authorization in this step. We can wrap the step with By updating the old SDK to the new one, we can use built-in retry strategy in the new SDK. Aside from the CIF, we should update old SDKs anyway. |
/azp run |
Azure Pipelines successfully started running 2 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have seen some install failures about authorization in this step. It should be caused by the AAD propagation delay.
We can wrap the step with
AuthorizationRetryingAction
, but it will retry the whole step again. Because we don't make steps idempotent, retrying the whole step might cause different issues.By updating the old SDK to the new one, we can use built-in retry strategy in the new SDK. This will retry just the failed API, so it's a safer way than
AuthorizationRetryingAction
.Aside from the CIF, we should update old SDKs anyway.
Oh, cool. I didn't realize the new SDK clients had that built-in retry strategy. That makes sense!
Please rebase pull request. |
…nts-track2 # Conflicts: # pkg/cluster/cluster.go
/azp run |
Azure Pipelines successfully started running 2 pipeline(s). |
/azp run e2e |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a quick review, didn't look through unit tests added.
userAssignedIdentities armmsi.UserAssignedIdentitiesClient | ||
|
||
dns dns.Manager | ||
storage storage.Manager | ||
subnet subnet.Manager | ||
subnet subnet.Manager // TODO: use armSubnet instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO should be removed or carded up?
for _, endpoint := range endpoints { | ||
endpointFound, serviceEndpointPtr := subnetContainsEndpoint(subnet, endpoint) | ||
|
||
if !endpointFound || *serviceEndpointPtr.ProvisioningState != armnetwork.ProvisioningStateSucceeded { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can ProvisioningState ever be in an error state? We wouldn't want to add it then I'm guessing
Which issue this PR addresses:
Fixes https://issues.redhat.com/browse/ARO-4665 and https://issues.redhat.com/browse/ARO-7316
What this PR does / why we need it:
This PR replaces old SDK to the new one.
Test plan for issue:
ci, e2e
Is there any documentation that needs to be updated for this PR?
no
How do you know this will function as expected in production?
Creating a cluster can verify the change.
It can be checked in e2e.